ABSTRACT


A general method is described for achieving super resolution imagery
from multiple-frame image sequences that contain motion. The method
assumes low-noise focal plane array imagery recorded with uncontrolled
image motion that can include some random jitter. Using this approach,
moderate-resolution, fast-frame-rate image sequences can be processed to
achieve high-resolution image sequences displayed at conventional
frame rates. The super resolution processing depends only on the
imagery, requiring no externally controlled micro-dither or a priori
information such as the sensor motion or the range to the background.
Typical sensor stabilization requirements are relaxed by this method;
however, to achieve optimum performance there are some constraints on
the motion. Specifically, the stabilization must still be good enough that
the resulting random dither is only a few pixels, and the dither must be
statistically well behaved. Processing examples are given using previously
recorded image sequences from a wide FOV MWIR staring array sensor on board
an aircraft.


1. Introduction

The concept of super resolution as a multi-frame image restoration problem has been known for a
number of years. A number of innovations to this restoration technique have been studied to improve its
performance. Nevertheless, most of these techniques pertain to cases in which the image motion was
known, or in which the motion to be determined was a simple global shift or rotation. For many sensor
systems, the resulting image motion cannot be described as a simple global shift and can only be fully
described by the optical flow: a vector field that gives a two-dimensional shift for each
pixel.
An example of this case is a wide FOV sensor on board an aircraft looking down at a moderate
slant angle. The upper section of the image may be looking toward the horizon, where the relative motion
is very small, while the bottom of the image may be looking at nearby terrain that moves relatively fast
under the aircraft.

Assuming that the optical flow for each pixel can be accurately calculated and that the sensor
parameters are known (such as the optical transfer function, FPA geometry, FPA detector response function,
and FPA noise), a super resolution image can be computed from multi-frame data. The processing method
builds upon sampling theory and Wiener filtering. Later in this paper, examples are given of super
resolution processing applied to airborne MWIR image sequences taken with wide
FOV optics.

2. Super Resolution Using Controlled Micro-Dithering

The dimensions of a focal plane array do not necessarily limit the spatial resolution of a sensor.
A sub-pixel dither scan of a scene generates an image cube that can be used to reconstruct a single
super resolution composite with spatial resolution beyond the Nyquist limit defined by the detector
center-to-center spacing on the FPA. Figure 1 shows a simple micro-dither scan pattern.
Figure 1: Typical controlled micro-dither scan pattern that achieves 4 x 4 over-sampling.
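The controlled case is easy to simulate. The sketch below is illustrative only: it assumes an ideal 100% fill factor detector (block averaging), exact sub-pixel shifts on a fine grid with circular wrap-around for simplicity, and a raster ordering of the 16 dither positions; the names `capture_frame` and `interleave` are hypothetical. Interleaving the 16 frames by their known offsets yields a 4x oversampled image of the detector-blurred scene without any interpolation.

```python
import numpy as np

def capture_frame(scene, dx, dy, factor):
    """Simulate one FPA exposure of a fine-grid scene.

    The scene is shifted by (dx, dy) fine pixels (circularly, for simplicity)
    and each detector integrates a factor x factor block (100% fill factor).
    """
    shifted = np.roll(scene, shift=(-dy, -dx), axis=(0, 1))
    h, w = shifted.shape
    blocks = shifted.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def interleave(frames, factor):
    """Place each dithered frame on its own sub-pixel phase of the fine grid."""
    h, w = next(iter(frames.values())).shape
    out = np.empty((h * factor, w * factor))
    for (dx, dy), frame in frames.items():
        out[dy::factor, dx::factor] = frame
    return out

factor = 4
rng = np.random.default_rng(0)
scene = rng.random((32, 32))                 # hypothetical fine-grid scene
# One exposure per sub-pixel offset: a 4 x 4 raster micro-dither (16 frames).
frames = {(dx, dy): capture_frame(scene, dx, dy, factor)
          for dy in range(factor) for dx in range(factor)}
composite = interleave(frames, factor)       # 32 x 32 built from 16 frames of 8 x 8
```

Each fine-grid value of `composite` is the average of a detector-sized block starting at that fine position, so the result is fully sampled but still blurred by the detector footprint.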


In its simplest form, micro-dithering requires a perfectly stable sensor and accurate mechanical
control of a scanning device. A number of mechanical devices can generate the high-speed,
precise dithering shown in Figure 1; a piezo-electric driver is one example. Using
this technique on a moving platform with intrinsic jitter is not practical, because the micro-dither device alone
no longer controls the motion of the image on the focal plane: the motion of the aircraft and any
associated vibration would dominate that of the micro-dither scan pattern.


Nevertheless, it is instructive to examine the case of perfectly known, uniform micro-dither so
that the sampling issues of super resolution can be compared against interpolation. Increasing the number of
pixels in a digital image involves changing the spatial sampling rate of the image on the focal plane.
Controlled dithering increases the sampling rate by creating new sample points, as shown in Figure 1.
Alternatively, values at the new sample points could be estimated by interpolating a single frame.
However, such new samples are created using only the information already present in that image. Interpolation
is inherently band limited: it adds no new information, so the increased bandwidth afforded by the higher
sampling rate goes unused. This is shown graphically in Figure 2 below.
Figure 2: Comparison of interpolation and dithering methods to increase image size. Interpolation
is band limited to the information of a single frame and results in a blurred image. Dithering
increases sampling density through multiple exposures. Only through the combination of multiple
frames can sharper features emerge that were not observable in any single image.
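The band-limit argument can be checked with an idealized one-dimensional sketch that ignores detector blur and simply point-samples a cosine lying above the single-frame Nyquist limit. The signal, the sizes, and the helper names are illustrative assumptions. Band-limited (Fourier) interpolation of one frame faithfully reproduces the alias, while interleaving four frames dithered by one fine pixel each recovers the true frequency.

```python
import numpy as np

N_FINE, FACTOR = 64, 4
x = np.arange(N_FINE)
TRUE_CYCLES = 12   # above the single-frame Nyquist limit of 64 / 4 / 2 = 8 cycles
scene = np.cos(2 * np.pi * TRUE_CYCLES * x / N_FINE)

def dominant_cycles(signal):
    """Index of the strongest non-DC rfft bin (cycles across the field)."""
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0
    return int(np.argmax(spectrum))

def fourier_upsample(signal, factor):
    """Band-limited (sinc) interpolation by zero-padding the spectrum."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    if n % 2 == 0:
        spectrum[-1] *= 0.5      # split the Nyquist bin between +/- frequencies
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[:len(spectrum)] = spectrum
    return np.fft.irfft(padded, n * factor) * factor

single_frame = scene[::FACTOR]                        # 16 coarse samples
interpolated = fourier_upsample(single_frame, FACTOR)

dithered = np.empty(N_FINE)                           # FACTOR frames, one fine pixel apart
for k in range(FACTOR):
    dithered[k::FACTOR] = scene[k::FACTOR]

print(dominant_cycles(single_frame), dominant_cycles(interpolated), dominant_cycles(dithered))
# prints: 4 4 12
```

The interpolated signal passes through every original sample yet oscillates at the aliased 4 cycles, because interpolation cannot create spectral content beyond what the single frame contains.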

3. Super Resolution Using Random Motion

As described in the previous section, most super resolution methods rely on controlled dither scan
patterns followed by multi-frame image reconstruction. A more practical approach would be one that
accepts small, random dither shifts between images in the multi-frame sequence. The
approach still requires a multi-frame sequence of shifted images, but the requirement for a precise,
uniform micro-dither is relaxed. The approach can be summarized in four steps. First,
estimate the motion of the image, in terms of the optical flow, for each frame and each pixel in a multi-frame
sequence. Second, assemble the resulting multi-frame data into a single super resolution image on
irregularly spaced sample locations. Third, interpolate the irregularly spaced samples onto a regularly
spaced grid. Fourth, compensate for known blurring effects due to the optics and the detector response
function. Each step is discussed separately below.


3.1 Motion Estimation

The shift between two frames in an image sequence can be estimated by various methods. Generally, this
motion is not uniform across an image, so the optical flow must be computed: a shift for every pixel
between two sequential frames. An optical flow calculation is demonstrated in
Figure 3. Our approach began with phase correlation applied region by region across the image, which on its
own gave limited performance. This was improved by imposing the modeling assumption that, for well-behaved
backgrounds and image motion, the optical flow changes monotonically across each frame pair. Extensions of
this technique to multi-scale or wavelet-based implementations exist in the general literature and promise
greater robustness and computational efficiency.
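A minimal sketch of regional phase correlation, assuming NumPy, 32x32 regions as in Figure 3, and a parabolic fit around the correlation peak for sub-pixel precision. The original work does not state how sub-pixel shifts were obtained, so the refinement step and all function names here are assumptions.

```python
import numpy as np

def _parabolic_offset(c_minus, c0, c_plus):
    """Vertex of the parabola through three equally spaced samples at -1, 0, +1."""
    denom = c_minus - 2.0 * c0 + c_plus
    return 0.0 if denom == 0 else 0.5 * (c_minus - c_plus) / denom

def phase_correlation_shift(ref, moved):
    """Estimate the (dy, dx) translation that maps `ref` onto `moved`."""
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12               # whiten: keep phase only
    corr = np.real(np.fft.ifft2(cross))
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    ny, nx = corr.shape
    # Sub-pixel refinement along each axis using circular neighbours.
    fy = _parabolic_offset(corr[(py - 1) % ny, px], corr[py, px], corr[(py + 1) % ny, px])
    fx = _parabolic_offset(corr[py, (px - 1) % nx], corr[py, px], corr[py, (px + 1) % nx])
    # Convert peak indices to signed shifts.
    dy = py - ny if py > ny // 2 else py
    dx = px - nx if px > nx // 2 else px
    return float(dy + fy), float(dx + fx)

def regional_shifts(ref, moved, block=32):
    """Phase-correlate each block x block region; returns an array (rows, cols, 2)."""
    rows, cols = ref.shape[0] // block, ref.shape[1] // block
    field = np.zeros((rows, cols, 2))
    for i in range(rows):
        for j in range(cols):
            region = (slice(i * block, (i + 1) * block),
                      slice(j * block, (j + 1) * block))
            field[i, j] = phase_correlation_shift(ref[region], moved[region])
    return field
```

Whitening the cross spectrum makes the correlation surface close to a delta function for pure translation, which is why phase correlation tolerates contrast differences; small regions with little texture, however, give unreliable peaks, consistent with the limited regional performance noted above.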


[Figure 3 panels: Initial Frame; Subsequent Frame; Difference; Regularized Optical Flow]

Figure 3: Sequential frames of 50 Hz, wide FOV, 128x128 airborne imagery. The algebraic
difference between frames demonstrates motion non-uniformity across the scene. Regional (32x32)
shift estimators perform poorly, but simple model fitting provides markedly
improved motion estimation.
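The paper describes the regularization only as fitting a model in which the flow changes monotonically across the frame. As one assumed instance of such a model (not necessarily the one used in the original work), the sketch below fits an affine (planar) function to each flow component of the regional estimates by least squares; `fit_affine_flow` is a hypothetical name.

```python
import numpy as np

def fit_affine_flow(centers, shifts):
    """Least-squares fit of a smooth flow model to regional shift estimates.

    centers: (N, 2) region centres (x, y); shifts: (N, 2) measured (u, v).
    Each component is modelled as c0 + c1*x + c2*y, which changes
    monotonically across the frame. Returns a callable flow(x, y) -> (M, 2).
    """
    x, y = centers[:, 0], centers[:, 1]
    design = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(design, shifts, rcond=None)   # shape (3, 2)

    def flow(xq, yq):
        xq = np.ravel(xq).astype(float)
        yq = np.ravel(yq).astype(float)
        return np.column_stack([np.ones_like(xq), xq, yq]) @ coeffs

    return flow
```

Fitting three parameters per component to 16 regional estimates averages out noise in individual regions and yields a flow value at every pixel; an outlier-robust fit could be substituted where single regions fail badly.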

Note that in Figure 3 the motion, or optical flow, is greatest at the bottom of the image and very small near the top of
the image. In fact, it can be inferred from the optical flow plot that the aircraft is flying toward the point in the image
where the optical flow is zero.
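Locating the point of zero flow can be posed as a small least-squares problem. This sketch assumes pure forward translation with no sensor rotation, so that every flow vector points radially away from the focus of expansion; it is an illustration of the inference, not the procedure used in the original work, and `focus_of_expansion` is a hypothetical name.

```python
import numpy as np

def focus_of_expansion(xs, ys, us, vs):
    """Least-squares point (x0, y0) from which all flow vectors radiate.

    A vector (u, v) at (x, y) is radial about (x0, y0) when
    v * (x - x0) - u * (y - y0) = 0, i.e. v*x0 - u*y0 = v*x - u*y.
    """
    design = np.column_stack([vs, -us])
    rhs = vs * xs - us * ys
    (x0, y0), *_ = np.linalg.lstsq(design, rhs, rcond=None)
    return float(x0), float(y0)
```

Only the direction of each vector enters the constraint, so the estimate is unaffected by the range-dependent flow magnitude that makes nearby terrain move faster than the horizon.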
[Figure 3 panel: Optical Flow from Local Phase Correlation]


3.2 Assembling a Single Super Resolution Image from Multi-frame Data

With precise estimates of the optical flow for each frame of data, the original image sequence is combined
into a single super resolution image by assembling pixels according to their resulting coordinates. Note that under
random motion the true pixel coordinates no longer lie on a rectilinear grid, yielding a non-uniform sampling
of the optical image projected onto the focal plane. A hypothetical example of random dither points relative to
the original unit cell and the fine-mesh super resolution grid is given in Figure 4.
Figure 4: Graphical demonstration of assembling a 3-D spatial-temporal image deck into a
composite given coordinate knowledge for every pixel in space and time. This knowledge can be
successfully derived from motion analysis of the digital video and requires no external control or
cues.
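A sketch of the assembly step, assuming the per-pixel flow of every frame relative to a chosen reference frame is already available (for example from the motion-estimation step). The array layout, the sign convention, and the name `assemble_samples` are choices made here for illustration.

```python
import numpy as np

def assemble_samples(frames, flows, factor):
    """Map every pixel of every frame to reference-frame coordinates.

    frames: (T, H, W) intensities; flows: (T, H, W, 2) per-pixel (dx, dy)
    displacement of scene content from the reference frame to frame t, in
    coarse pixels. Returns a (T*H*W, 3) array of (x, y, intensity) with x, y
    expressed on a fine grid `factor` times denser than the FPA.
    """
    T, H, W = frames.shape
    ys, xs = np.mgrid[0:H, 0:W]
    samples = []
    for t in range(T):
        # A pixel at (x, y) in frame t sees the scene point that sat at
        # (x - dx, y - dy) in the reference frame.
        x_ref = (xs - flows[t, ..., 0]) * factor
        y_ref = (ys - flows[t, ..., 1]) * factor
        samples.append(np.column_stack([x_ref.ravel(), y_ref.ravel(), frames[t].ravel()]))
    return np.vstack(samples)
```

The output is an unordered cloud of irregularly spaced samples, which is exactly the non-uniform sampling that the regularization step must handle.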


Note that the points in Figure 4 represent a relatively good distribution of
sample points in space; that is, for each fine-mesh point on the regular grid (denoted by the open circles)
there are several candidate dither points (denoted by the X's) in the immediate neighborhood. These
nearby dither points can then be used to estimate the intensities at the regular lattice points. The
actual distribution of dither points is not guaranteed to provide good coverage over all parts of the image. For
example, Figure 5 shows four different distributions, one from each corner of the same image sequence. Note
that the two bottom corners are fairly well distributed. However, the top left corner suffers from too little
motion, and the top right corner suffers because the motion is restricted primarily to the horizontal
direction.
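One hypothetical way to quantify this coverage is to count the fine-grid sites that receive at least one nearby dither sample. The three synthetic cases below mimic a static corner, a corner with purely horizontal drift, and a well-distributed corner; the metric, the offsets, and the function names are illustrative assumptions, not measurements from Figure 5.

```python
import numpy as np

def coverage_fraction(xs, ys, fine_shape):
    """Fraction of fine-grid sites that receive at least one dither sample.

    Each sample is credited to the fine-grid site nearest to it (a square
    neighbourhood of half a fine pixel); samples off the grid are ignored.
    """
    height, width = fine_shape
    ix = np.rint(xs).astype(int)
    iy = np.rint(ys).astype(int)
    inside = (ix >= 0) & (ix < width) & (iy >= 0) & (iy < height)
    hits = np.zeros(fine_shape, dtype=int)
    np.add.at(hits, (iy[inside], ix[inside]), 1)
    return np.count_nonzero(hits) / hits.size

def dither_samples(offsets, coarse=8, factor=4):
    """Fine-grid coordinates of a coarse x coarse FPA at the given (dx, dy) shifts."""
    cy, cx = np.mgrid[0:coarse, 0:coarse]
    xs = np.concatenate([(cx.ravel() + dx) * factor for dx, dy in offsets])
    ys = np.concatenate([(cy.ravel() + dy) * factor for dx, dy in offsets])
    return xs, ys

quarter = [0.0, 0.25, 0.5, 0.75]
cases = {
    "no motion": [(0.0, 0.0)] * 16,
    "horizontal only": [(q, 0.0) for q in quarter for _ in range(4)],
    "well distributed": [(qx, qy) for qx in quarter for qy in quarter],
}
for name, offsets in cases.items():
    print(name, coverage_fraction(*dither_samples(offsets), (32, 32)))
# prints: no motion 0.0625 / horizontal only 0.25 / well distributed 1.0
```

All three cases use 16 frames, yet only the well-distributed case fills the 4x fine grid, mirroring the corner-to-corner differences described above.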


Figure 5: Pixel drifts for 16 frames of 50 Hz airborne wide FOV video. Note the pronounced
regional dependence of drift motion during this episode. Pixels in the foreground drift across 3
coarse lattice sites, providing good coverage of fine lattice sites for super resolution. Other pixels
drift minimally and offer little information beyond that of a single frame. Image quality of
the super resolution composite tracks the suitability of the pixel drifts.


3.3 Regularizing the Super Resolution Image

Our approach tolerates dither coordinates in the assembled composite that do not lie on a
rectilinear grid. This requires converting the non-uniformly sampled optical image to a regular sample array
through interpolation. Our technique used a bilinear method based on the four nearest candidate dither
intensities.
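The regularization step can be sketched with SciPy's `scipy.interpolate.griddata`. Its `method="linear"` option interpolates linearly over a Delaunay triangulation of the samples, using the three vertices of the enclosing triangle rather than the four nearest samples described above, so this is a stand-in for the original bilinear scheme, not a reproduction of it. The function name `regularize` is hypothetical, and the input is assumed to be an (N, 3) array of (x, y, intensity) in fine-grid units.

```python
import numpy as np
from scipy.interpolate import griddata

def regularize(samples, fine_shape):
    """Resample scattered (x, y, intensity) samples onto a regular fine grid.

    Grid sites outside the convex hull of the samples are returned as NaN.
    """
    grid_y, grid_x = np.mgrid[0:fine_shape[0], 0:fine_shape[1]]
    return griddata(samples[:, :2], samples[:, 2], (grid_x, grid_y), method="linear")
```

Where samples cluster tightly around every grid site the choice of interpolant matters little; in sparsely covered regions the interpolation distances, and hence the errors, are larger.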

All interpolation techniques distort the signal, and the distortion grows with increasing
geometric distance between known data positions and estimated data positions. When the dithering density is
high enough that the interpolation distances are small, interpolation errors are on the order of the sensor noise
and this step can be skipped.

3.4 Recovering from Known Blurring Effects

An image sampled by a focal plane array suffers from heavy blurring due to the spatial integration of
each detector and the blur associated with resolution-limited optics. The high spatial frequencies lost
to blurring can be partially recovered by a restoration filter. In the derivation below, a simple model of
optical and detector blurring is used to compute a deconvolution filter. We first model the optical image
as a geometric projection of a real-world scene onto a focal plane.


Next, we model the effect of OTF blur and spatial integration of the detector as linear operators acting on
the geometric image to generate a pseudo image.
The effect of detector spatial integration can be modeled as convolution with a boxcar function.
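For square detectors of width a (the detector size used with Figure 6), one conventional way to write the boxcar and its spatial frequency response is shown below. The unit-area normalization is an assumption made here so that the response equals 1 at zero frequency, and sinc(u) = sin(pi u)/(pi u).

$$h_{det}(x_1,x_2) = \frac{1}{a^2}\,\mathrm{rect}\!\left(\frac{x_1}{a}\right)\mathrm{rect}\!\left(\frac{x_2}{a}\right) \quad\Longleftrightarrow\quad H_{det}(F_1,F_2) = \mathrm{sinc}(aF_1)\,\mathrm{sinc}(aF_2)$$

The first zeros of this response fall at |F_i| = 1/a, so for a 100% fill factor (a = b) the detector passes substantial energy above the Nyquist frequency 1/(2b), which is the source of the alias distortion discussed below.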
The optical system can be modeled with a resolution-limited spatial frequency response

$$H_{OTF}(F_1,F_2) = \mathrm{rect}\!\left(\frac{F_1}{2F_c}\right)\mathrm{rect}\!\left(\frac{F_2}{2F_c}\right),$$

where F_c denotes the optical cutoff frequency.
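Sampling the pseudo image with a regular lattice of detectors of pitch b creates a discrete-space image related to the
continuous-space image by

$$s[n_1,n_2,t] = s_p(n_1 b,\, n_2 b,\, t).$$

Both the continuous-space and discrete-space images can be characterized by their respective Fourier
transforms

$$S_p(F_1,F_2,t) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s_p(x_1,x_2,t)\, e^{-j2\pi(F_1x_1+F_2x_2)}\,dx_1\,dx_2,$$

$$S(f_1,f_2,t) = \sum_{n_1}\sum_{n_2} s[n_1,n_2,t]\, e^{-j2\pi(f_1n_1+f_2n_2)},$$

where f_1 and f_2 are in cycles per sample. The continuous-space Fourier transform and the discrete-space Fourier series are related by

$$S(f_1,f_2,t) = \frac{1}{b^2}\sum_{k_1}\sum_{k_2} S_p\!\left(\frac{f_1-k_1}{b},\,\frac{f_2-k_2}{b},\,t\right).$$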
For an array of detectors with size a and pitch b, the relationship between the spatial and spatial
frequency domains is shown graphically in Figure 6.
Figure 6: Relationship between the spatial response of an ideal integrating FPA and the
corresponding modulation in spatial frequency space.

Ignoring optical blur, a 100% fill factor focal plane array corrupts the sampled image with heavy
alias distortion, as shown in Figure 7 (upper left). Diffraction-limited optics with at least 2
pixels per blur circle eliminate alias errors at the cost of reduced spatial resolution (lower left).
Increasing the sampling rate through dithering reduces alias error by further separating the discrete spatial
frequency images of the pixel blur function (upper right). A diffraction-limited optic with a smaller
blur circle can again eliminate alias distortion (lower right).
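This trend can be checked numerically in one dimension. The sketch below assumes the unit-area boxcar detector (frequency response sinc(aF)), an ideal rect optical cutoff, and uniform sampling at pitch p, and it uses the fraction of the detector's power response lying beyond the Nyquist frequency 1/(2p) as a rough proxy for alias error; the proxy, the function name, and its parameters are illustrative assumptions.

```python
import numpy as np

def alias_fraction(a, pitch, optic_cutoff=None, f_max=50.0, n=200_001):
    """Fraction of the (truncated) power response lying beyond Nyquist.

    a: detector width; pitch: effective sample spacing (b, or b/k for k-fold
    dithering); optic_cutoff: optional ideal optical cutoff frequency.
    """
    f = np.linspace(-f_max, f_max, n)
    response = np.sinc(a * f)                 # np.sinc(u) = sin(pi u) / (pi u)
    if optic_cutoff is not None:
        response = response * (np.abs(f) <= optic_cutoff)
    power = response ** 2
    nyquist = 1.0 / (2.0 * pitch)
    return power[np.abs(f) > nyquist].sum() / power.sum()

b = 1.0
plain = alias_fraction(a=b, pitch=b)                                 # 100% fill, no dither
dithered = alias_fraction(a=b, pitch=b / 3)                          # 3-fold dither
with_optic = alias_fraction(a=b, pitch=b, optic_cutoff=1 / (2 * b))  # Nyquist optic
```

Under these assumptions roughly a fifth of the detector power lies above Nyquist without dithering, 3-fold dithering shrinks that share considerably, and an optical cutoff at the Nyquist frequency removes it entirely, in line with the four panels of Figure 7.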


Figure 7: (Upper Left) Spatial frequency response of a 100% fill factor FPA.
(Lower Left) 100% fill factor FPA with Nyquist blurring optic.
(Upper Right) 3-fold dithered 100% fill factor FPA.
(Lower Right) 3-fold dithered 100% fill factor FPA with Nyquist blurring optic.


Dithered imagery still suffers from heavy blurring due to detector spatial integration. Such linear
effects can be restored optimally, in the least-squares sense, with a Wiener filter given the spectra of image clutter and sensor noise.
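A minimal frequency-domain sketch of such a restoration. It assumes a flat (white) noise-to-signal power ratio in place of measured clutter and noise spectra, a circular 4x4 boxcar PSF standing in for detector integration on a 4x fine grid, and that the 10:1 SNR quoted for Figure 8 is an amplitude ratio, so the power ratio used is 0.01. The scene and the name `wiener_restore` are hypothetical; a real implementation would use measured spectra and include the optical transfer function.

```python
import numpy as np

def wiener_restore(blurred, psf, nsr):
    """Wiener deconvolution with a flat noise-to-signal power ratio `nsr`.

    W = conj(H) / (|H|^2 + nsr); the PSF is treated as circular (FFT-based).
    """
    H = np.fft.fft2(psf, s=blurred.shape)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(W * np.fft.fft2(blurred)))

# Hypothetical test scene blurred by a 4 x 4 detector boxcar on a 4x fine grid.
rng = np.random.default_rng(3)
scene = rng.random((64, 64))
psf = np.full((4, 4), 1.0 / 16)                       # unit-area boxcar
H = np.fft.fft2(psf, s=scene.shape)
blurred = np.real(np.fft.ifft2(H * np.fft.fft2(scene)))
noisy = blurred + rng.normal(scale=0.01, size=scene.shape)
restored = wiener_restore(noisy, psf, nsr=0.01)       # 10:1 SNR as an amplitude ratio
```

The `nsr` term keeps the filter bounded at the zeros of the boxcar response, trading some residual blur for limited noise amplification.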
Figure 8: Ideal distortion modeling of a 100% fill factor dithered FPA with Nyquist optic. On the
right is the optimal least-squares (Wiener) restoration filter given a 10:1 SNR. Dithering allows
recovery from pixel blurring because alias distortion is markedly reduced.
Figure 9: Modeling of the spatial frequency response and resultant imagery, demonstrating
interpolation, dithering, and dithering followed by Wiener filter restoration.


the sensor is sufficiently dithered that the alias distortion is on the same order as the sensor noise.
Depending on the OTF blur and the clutter-to-noise spectra, the useful resolution from dithering is roughly
limited to the focal plane array dimensions multiplied by the signal-to-noise ratio.

4. Results

Our procedure was applied to NRL airborne mid-wave infrared data taken in August 1995.
The original digital video was wide FOV, 128x128 pixels, clocked at 50 frames/second. Our
demonstration integrated 16 frames of data, using only the ambient aircraft motion to dither the sensor.
The performance of our super resolution algorithms depends heavily on the ambient motion, which varies
dramatically across the field of view of the sensor. Enhancements and limitations are clearly observable
in the resultant imagery presented in Figures 10-13, and they closely track the local scene drifts. Each figure
shows matching pairs of 64x64 cropped and zoomed images that demonstrate the performance of
interpolation and super resolution methods compared with the original low-resolution scene.
Figure 10: Comparison of 4-fold interpolation and 4X super resolution of a 64x64 image
Figure 11: Comparison of 4-fold interpolation and 4X super resolution of a 64x64 image
Figure 12: Comparison of 4-fold interpolation and 4X super resolution of a 64x64 image
Figure 13: Comparison of 4-fold interpolation and 4X super resolution of a 64x64 image


5. Summary

A general technique is presented for generating higher resolution imagery from lower resolution image
sequences containing uncontrolled scene motion. The technique estimates scene motion from a temporal
optical flow calculation. Knowledge of pixel motion and intensities allows a composite image to be assembled
with higher resolution than any single frame. Minor modifications to the data set account
for the non-uniform spatial sampling intrinsic to uncontrolled motion; their effectiveness
depends on the statistical distribution of the motion. A final restoration filter, dependent on sensor
parameters and noise performance, is applied to the data set to provide an optimal recovery of known
distortions intrinsic to super resolution. This process is demonstrated on 50 Hz mid-wave infrared imagery
from a hard-mounted staring sensor flown at an altitude of about 3,000 feet. The wide range of
motion across the scene caused the final performance of the algorithm to vary across the image.